I want to start off by saying thank you for taking time out of your undoubtedly busy life to field SEO’s most burning questions. That being said, I beg you to please stop answering questions via video. I’ve been chained to my desk for over a day now, and my skin has been rubbed raw from the cuff around my ankle (and from my unsuccessful attempts to gnaw off my own foot). You don’t know what Rand Fishkin is really like. He made it explicitly clear that if I don’t keep summarizing your videos, I won’t live past my birthday. Please…I’m just a poor college grad who unknowingly got thrown into the dark and sordid world of SEO. If you insist on continuing to podcast your responses, then at least send help…and a sandwich. No mustard, please.
Regards,
Rebecca
Now, on to Part 3 of the Matt Cutts Video Anthology…
This segment begins with Matt sipping from a can of pop. He looks up and exclaims, “I was just enjoying some delicious Diet Sprite Zero while reading my new issue of Wired Magazine. Boy, they really captured the asymmetry of Stephen Colbert’s ears, didn’t they? I think it would be fun to do fake commercials. Diet Sprite has not paid me anything for endorsing them.”
1. Does Google Analytics play a part in SERPs?
To the best of my knowledge, it does not. I’m not gonna categorically say we don’t use it anywhere in Google, but I was asked this question at WebmasterWorld Las Vegas last year, and I pledged that the webspam team would not use Google Analytics data at all. Now, webspam is just a part of quality, and quality is just a part of Google, but webspam definitely hasn’t used Google Analytics data to the best of my knowledge. Other places in Google don’t either, because we want people to feel comfortable with it and just use it.
2. When does Google detect duplicate content, and within what range does content count as duplicate?
Not a simple answer. The short answer is that we do a lot of duplicate content detection. It’s not like there’s one stage where we say, “Okay, right here is where we detect the duplicates”; rather, it happens all the way from the crawl, through indexing, through scoring, all the way down to just milliseconds before we answer a query.
There are different types of duplicate content. There’s certainly exact duplicate detection: if one page looks exactly the same as another page, catching that can be quite helpful. But at the same time, it’s not the case that pages are always exactly the same, so we also detect near duplicates, and we use a lot of sophisticated logic to do that.
In general, if you think you might be having problems, your best bet is to make sure your pages are quite different from each other. We do a lot of duplicate detection so that we can crawl less and provide better results with more diversity.
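Matt doesn’t reveal how Google’s detection actually works, but to make “near duplicate” concrete, here’s a minimal, purely illustrative sketch using word shingles and Jaccard similarity, one textbook approach; none of the names or thresholds below come from Google.

```python
# Illustrative only: a textbook way to flag near-duplicate pages using
# word shingles and Jaccard similarity. This is not Google's method.
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def near_duplicate(page_a, page_b, threshold=0.9):
    """True if the Jaccard similarity of the pages' shingle sets is high."""
    a, b = shingles(page_a), shingles(page_b)
    if not a or not b:
        return page_a == page_b  # fall back to exact comparison
    similarity = len(a & b) / len(a | b)
    return similarity >= threshold


doc = " ".join(f"word{i}" for i in range(200))        # a long, distinct page
edited = doc + " plus a short extra sentence on the end"
print(near_duplicate(doc, doc))      # True: exact duplicate
print(near_duplicate(doc, edited))   # True: near duplicate after a small edit
```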
3. I’d like to explicitly exclude a few of my sites from the default moderate safe search filtering, but Google seems to be less of a prude than I’d prefer. Is there any hope of a tag, attribute, or other snippet to limit a page to unfiltered results, or should I just start putting a few nasty words in the alt tags of blank images? (Matt really likes this question)
Well, don’t put them in blank images; put them in your meta tags. When I was writing the very first version of safe search, I noticed that there were a lot of pages that did not tag their sites or their pages at all to say “We are adult content.” There are a lot of industry groups and standards for that, but at that time the vast majority of porn pages simply ignored those tags, so it wasn’t all that big of a win to rely on them.
The short answer to your question is that, to the best of my knowledge, there’s no tag that just says “I am porn; please exclude me from your safe search.” It’s wonderful that you’re asking about that. Your best bet is meta tags, because safe search, unlike a lot of our other stuff, actually does look at the raw content of a page, or at least the version I last saw does. If you put it in your meta tags, or even in comments, which is something that isn’t usually indexed by Google very much at all, we should be able to detect the page as porn that way. Don’t use blank images. Don’t use images that people can’t see.
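To make the self-labeling idea concrete, here’s a purely hypothetical sketch of a filter that reads only the raw-content locations Matt suggests (meta tags and comments); the marker list and parsing are mine, not anything Google has described.

```python
# Hypothetical sketch of honoring self-labeling for an adult-content filter:
# look only at meta tags and HTML comments, the raw-content spots Matt
# suggests. The marker list and regexes are illustrative, not Google's.
import re

ADULT_MARKERS = {"adult", "porn", "xxx", "explicit"}  # illustrative only


def self_labels_as_adult(raw_html):
    """True if the page labels itself as adult in meta tags or comments."""
    meta_content = re.findall(
        r'<meta[^>]*content=["\']([^"\']*)["\']', raw_html, flags=re.I)
    comments = re.findall(r"<!--(.*?)-->", raw_html, flags=re.S)
    text = " ".join(meta_content + comments).lower()
    return any(marker in text for marker in ADULT_MARKERS)


page = ('<html><head>'
        '<meta name="keywords" content="adult, explicit content">'
        '<!-- this site is porn; please filter it from safe search -->'
        '</head><body>...</body></html>')
print(self_labels_as_adult(page))  # True: the page labels itself
```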
4. Sometimes I make a select box spiderable by just putting links in the option elements. Normal browsers ignore them, and spiders ignore the option tags. But since Google is using a Mozilla-based bot, and the bot renders the page before it crawls it, how do I know whether the Mozilla engine will remove those links from the document object model tree? (He’s saying “Can I put links in an option box?”)
You can, but I wouldn’t recommend it. This is pretty nonstandard behavior; it’s very rare, and it would definitely make my eyebrows go up if I were to see it. It’s better for your users and better for search engines if you just take those links out and put them somewhere at the bottom of the page or on a site map. That way we’ll be able to crawl right through them as regular hyperlinks without any special handling.
5. I’ve mentioned before that I’d love to see you do a “define” type post where you define terms that you Googlers use that we non-Googlers might get confused about—things like data refresh, orthogonal, etc. You may have defined them in various places, but one cheat sheet type list would be great.
A very good question. At some point I’ll have to do a blog post about hosts vs. domains and a bunch of stuff like that.
Several people have been asking questions about June 27th and July 27th, so let me talk about those a little bit in the context of a data refresh vs. an algorithm update vs. an index update. I’ll use the metaphor of a car.
Back in 2003 we would crawl and index the web about once every month, and when we did that, it was called an index update. Algorithms could change, data could change, everything could change all in one shot, so that was a pretty big deal, and WebmasterWorld would name those index updates. Now that we pretty much crawl and refresh some of our index every single day, it’s everflux; it’s an always-ongoing sort of process.
The biggest changes that people tend to see now are algorithm updates. You don’t see many index updates anymore because we’ve moved away from that monthly update cycle. About the only time you might see one is if we compute an index that is incompatible with the old index. For example, if we changed how we do segmentation of CJK (Chinese, Japanese, and Korean) text, we might have to completely change our index and build another one in parallel. Index updates are relatively rare.
Algorithm updates are basically when you change your algorithm, maybe changing how you score a particular page. You say to yourself, “Oh, PageRank matters this much more or this much less,” things like that. Those can happen at pretty much any time, so we call them asynchronous: whenever we have an algorithm update that evaluates positively, improving quality and relevance, we go ahead and push it out.
The smallest change is called a data refresh. That’s essentially changing the input to the algorithm: you’re changing the data that the algorithm works on.
In the car metaphor, an index update would be changing a large section of the car, or even the car entirely. An algorithm update would be changing a part of the car, maybe swapping the engine for a different engine or some other large component. A data refresh is more like changing the gas in your car: every one or two weeks (or three weeks if you’re driving a hybrid), you change what actually goes in, and the algorithm then operates on that new data.
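If it helps to see the car metaphor in code, here’s a purely illustrative sketch; none of the names, weights, or structures come from Google. The index is the data store, the scoring function is the algorithm, and the three kinds of change touch different pieces.

```python
# Purely illustrative mapping of the car metaphor onto code; none of these
# names, weights, or structures come from Google.

def score_v1(signals):
    """The 'algorithm': combines per-page signals into one score."""
    return 0.7 * signals["pagerank"] + 0.3 * signals["relevance"]


def score_v2(signals):
    """Algorithm update: a new scoring function (a new engine for the car)."""
    return 0.5 * signals["pagerank"] + 0.5 * signals["relevance"]


# The index: per-page signal data that the algorithm consumes.
index = {"example.com/page": {"pagerank": 0.40, "relevance": 0.80}}
scorer = score_v1

# Data refresh (changing the gas): same index layout, same algorithm,
# new input data flows in.
index["example.com/page"]["pagerank"] = 0.45

# Algorithm update (changing the engine): same data, new scoring function.
scorer = score_v2

# Index update (changing a large section of the car, or the whole car):
# rebuild the index in a new, incompatible layout and swap it in wholesale.
index = {url: {"signals": signals} for url, signals in index.items()}

print(scorer(index["example.com/page"]["signals"]))
```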
For the most part, data refreshes are a very common thing, and we try to be very careful about how we safety-check them. Some data refreshes happen all the time. For example, we compute PageRank continually and continuously: there’s always a bank of machines refining PageRank based on incoming data, and PageRank goes out all the time, any time there’s a new update to our index, which happens pretty much every day. By contrast, the data behind some algorithms is refreshed every week or every couple of weeks, so those are data refreshes that happen at a slower pace.
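Since Matt points to PageRank as the canonical always-refreshing data, here’s the textbook power-iteration form of PageRank (the published formula, not Google’s production pipeline); re-running something like this over fresh link data is the flavor of “data refresh” he’s describing.

```python
# Textbook power-iteration PageRank (the published formula, not Google's
# production system). Re-running it whenever new link data arrives is a
# simple model of the continuous refresh Matt describes.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                 # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))  # ranks sum to ~1.0; "c" ends up slightly highest
```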
So, for the particular algorithm that people are interested in regarding June 27th and July 27th: that algorithm has actually been live for over a year and a half now. It’s the data refreshes that you’re seeing change the way people’s sites rank. In general, if your site has been affected, go back and take a fresh look and see if there’s anything that might be exceedingly over-optimized, or ask yourself, “Maybe I’ve been hanging out on SEO forums for so long that I need to have a regular person come in, take a look at the site, and see if it looks okay.” If you’ve tried all the regular stuff and it still looks fine to you, then I would just keep building regular good content and try to make the site very useful; if the site is useful, then Google should fight hard to make sure it ranks where it should be ranking.
That’s about the most advice I can give about the June 27th and July 27th data refreshes, because it gets into our secret sauce a little bit. But hopefully that gives you a bit of an idea of the scale and magnitude of the different changes. Algorithm changes happen a little more rarely, but data refreshes are always happening; sometimes they happen from day to day, and sometimes from week to week or month to month.